Constraints on Parallelism Beyond 10 Instructions Per Cycle

نویسندگان

John G. Cleary

Richard H. Littin

Murray W. Pearson

چکیده

The problem of extracting InstructionLevel Parallelism at levels of 10 instructionsper clock and higher is considered. Two different architectures which use speculation on memory accesses to achieve this level of performance are reviewed. It is pointed out that while this form of speculation gives high potential parallelism it is necessary to retain execution state so that incorrect speculation can be detected and subsequently squashed. Simulation results show that the space to store such state is a critical resource in obtaining good speedup. To make good use of the space it is essential that state be stored efficiently and that it be retired as soon as possible. A number of techniques for extracting the best usage from the available state storage are introduced.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Effects of Re-ordered Memory Operations on Parallelism

The performance effect of permitting different memory operations to be re-ordered is examined. The available parallelism is computed using a machine code simulator. A range of possible restrictions on the re-ordering of memory operations is considered: from the purely sequential case where no re-ordering is permitted; to the completely permissive one where memory operations may occur in any ord...

متن کامل

Performance Measures of Superscalar Processor

In this paper the author describes about superscalar processor and its architecture. A superscalar architecture is one in which several instructions can be initiated simultaneously and executed independently. pipelining allows several instructions to be executed at the same time, but they have to be in different pipeline stages at a given moment. Superscalar architectures include all features o...

متن کامل

Optimum Instruction-level Parallelism (ILP) for Superscalar and VLIW Processors

Modern superscalar and VLIW processors fetch, decode, issue, execute, and retire multiple instructions per cycle. By taking advantage of instruction-level parallelism (ILP), processor performance can be improved substantially. However, increasing the level of ILP may eventually result in diminishing and negative returns due to control and data dependencies among subsequent instructions as well ...

متن کامل

A case for merging the ILP and DLP paradigms

The goal of this paper is to show that instruction level parallelism (ILP) and data-level parallelism (DLP) can be merged in a stngle architecture to ezecute vectorizable code at a performance level that can not be achieved using either paradigm on its own. We will show that the combination of the two techniques yields very high performance at a low cost and a low complexity. We will show that ...

متن کامل

Design and Validation of a Simultaneous Multi-Threaded DLX Processor

| Modern day computer systems rely on two forms of parallelism to achieve high performance, parallelism between individual instructions of a program (ILP) and parallelism between individual threads (TLP). Superscalar processors exploit ILP by issuing several instructions per clock, and multiprocessors (MP) exploit TLP by running di erent threads in parallel on di erent processors. A fundamental...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 1997

Constraints on Parallelism Beyond 10 Instructions Per Cycle

نویسندگان

چکیده

منابع مشابه

Effects of Re-ordered Memory Operations on Parallelism

Performance Measures of Superscalar Processor

Optimum Instruction-level Parallelism (ILP) for Superscalar and VLIW Processors

A case for merging the ILP and DLP paradigms

Design and Validation of a Simultaneous Multi-Threaded DLX Processor

عنوان ژورنال:

اشتراک گذاری